To use these instructions you should always add
#include <mmx.h>
to your source file to get the intrinsic prototypes.
The following primitives are provided:
Support for the AMD 3DNOW instruction set
Example of a 3DNOW program in C
void _stdcall _pavgusb(_mmxdata *array1,_mmxdata *array2,int n);
The PAVGUSB instruction produces the rounded averages of the eight unsigned 8-bit
integer values in the source operand (an MMX register or a 64-bit memory
location) and the eight corresponding unsigned 8-bit integer values in the
destination operand (an MMX register). For each byte it adds the source and
destination values, adds 001h to the 9-bit intermediate sum, and divides the
result by 2 (shifts it right one place). The eight unsigned 8-bit results are
stored in the MMX register specified as the destination operand. The PAVGUSB
instruction can be used for pixel averaging in MPEG-2 motion compensation and
video scaling operations.
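A portable C model makes the per-byte arithmetic concrete. This sketch is not the lcc `_pavgusb` intrinsic. The example program at the end of this section suggests that the intrinsics apply the instruction to `n` 64-bit elements, with `array1` as the destination, but that reading is an assumption. The function name `pavgusb_ref` and the packing of the eight bytes into a `uint64_t` (lane 0 in the low byte) are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Reference model of PAVGUSB (hypothetical helper, not the lcc intrinsic).
   Each of the eight unsigned byte lanes becomes (dst + src + 1) >> 1. */
static uint64_t pavgusb_ref(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned a = (unsigned)((dst >> (8 * lane)) & 0xFFu);
        unsigned b = (unsigned)((src >> (8 * lane)) & 0xFFu);
        unsigned avg = (a + b + 1u) >> 1;   /* 9-bit sum plus 1, then shift right */
        result |= (uint64_t)avg << (8 * lane);
    }
    return result;
}
```

For example, averaging the packed bytes 0x0102030405060708 with 0x0202020202020202 gives 0x0202030304040505. Because of the +1, odd sums round up, so 7 and 2 average to 5 rather than 4.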
Numerical Range for the PF2ID Instruction
void _stdcall _pf2id(_mmxdata *array1,_mmxdata *array2,int n);
PF2ID is a vector instruction that
converts a vector register containing single-precision, floating-point operands
to 32-bit signed integers using truncation. The table below shows the numerical
range of the PF2ID instruction. The PF2ID instruction performs the following
operations:
IF (mmreg2/mem64[31:0] >= 2^31)
THEN mmreg1[31:0] = 7FFF_FFFFh
ELSEIF (mmreg2/mem64[31:0] <= –2^31)
THEN mmreg1[31:0] = 8000_0000h
ELSE mmreg1[31:0] = int(mmreg2/mem64[31:0])
IF (mmreg2/mem64[63:32] >= 2^31)
THEN mmreg1[63:32] = 7FFF_FFFFh
ELSEIF (mmreg2/mem64[63:32] <= –2^31)
THEN mmreg1[63:32] = 8000_0000h
ELSE mmreg1[63:32] = int(mmreg2/mem64[63:32])
Source (mmreg2/mem64) | Result (mmreg1)
0 | 0
Normal, abs(Source) < 1 | 0
Normal, –2147483648 < Source <= –1 | round to zero (Source)
Normal, 1 <= Source < 2147483648 | round to zero (Source)
Normal, Source >= 2147483648 | 7FFF_FFFFh
Normal, Source <= –2147483648 | 8000_0000h
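The saturating truncation in the table can be written per lane in C, which helps when checking results. This is a model and not the lcc intrinsic. The name `pf2id_lane` is hypothetical, and only normal (finite) inputs are handled, matching the table rows.

```c
#include <stdint.h>

/* Model of one PF2ID lane (hypothetical helper; normal, finite inputs only). */
static int32_t pf2id_lane(float x)
{
    if (x >= 2147483648.0f)  return INT32_MAX;  /* >= 2^31  -> 7FFF_FFFFh */
    if (x <= -2147483648.0f) return INT32_MIN;  /* <= -2^31 -> 8000_0000h */
    return (int32_t)x;  /* C float-to-int conversion truncates toward zero */
}
```

For example, 3.7 converts to 3, -3.7 to -3, 0.5 to 0, and 3e9 saturates to 7FFF_FFFFh.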
void _stdcall _pfacc(_mmxdata *array1,_mmxdata *array2,int n);
PFACC is a vector instruction that adds the two 32-bit halves of the destination
operand and stores the sum in the low half of the destination, and adds the two
32-bit halves of the source operand and stores that sum in the high half of the
destination. Both operands are single-precision, floating-point operands with
24-bit significands.
The PFACC
instruction performs the following operations:
mmreg1[31:0] = mmreg1[31:0] + mmreg1[63:32]
mmreg1[63:32] = mmreg2/mem64[31:0] + mmreg2/mem64[63:32]
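PFACC is mainly useful for horizontal sums. The sketch below models it in plain C. The `f32x2` struct, which views one 64-bit operand as a low and a high float, and the names `pfacc_ref` and `hsum4` are hypothetical and are not part of `<mmx.h>`.

```c
/* Hypothetical stand-in for one 64-bit 3DNOW operand: two floats. */
typedef struct { float lo, hi; } f32x2;

/* Model of PFACC: low = sum of destination halves, high = sum of source halves. */
static f32x2 pfacc_ref(f32x2 dst, f32x2 src)
{
    f32x2 r;
    r.lo = dst.lo + dst.hi;
    r.hi = src.lo + src.hi;
    return r;
}

/* Horizontal sum of four floats held in two operands, using two PFACC steps. */
static float hsum4(f32x2 a, f32x2 b)
{
    f32x2 t = pfacc_ref(a, b);  /* t = { a.lo + a.hi, b.lo + b.hi } */
    t = pfacc_ref(t, t);        /* t.lo = total of all four values */
    return t.lo;
}
```

With a = {1, 2} and b = {10, 20}, the first step gives {3, 30} and the second gives a total of 33.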
void _stdcall _pfadd(_mmxdata *array1,_mmxdata *array2,int n);
PFADD is a
vector instruction that performs addition of the destination operand and the
source operand. Both operands are single-precision, floating-point operands
with 24-bit significands.
The PFADD
instruction performs the following operations:
mmreg1[31:0] = mmreg1[31:0] + mmreg2/mem64[31:0]
mmreg1[63:32] = mmreg1[63:32] + mmreg2/mem64[63:32]
void _stdcall _pfcmpeq(_mmxdata *array1,_mmxdata *array2,int n);
PFCMPEQ is a vector instruction that compares each single-precision value in the
destination operand with the corresponding value in the source operand for
equality. Each 32-bit element of the destination is set to all one bits
(FFFF_FFFFh) if the two values are equal and to all zero bits otherwise.
The PFCMPEQ
instruction performs the following operations:
IF (mmreg1[31:0] = mmreg2/mem64[31:0])
THEN mmreg1[31:0] = FFFF_FFFFh
ELSE mmreg1[31:0] = 0000_0000h
IF (mmreg1[63:32] = mmreg2/mem64[63:32])
THEN mmreg1[63:32] = FFFF_FFFFh
ELSE mmreg1[63:32] = 0000_0000h
void _stdcall _pfcmpge(_mmxdata *array1,_mmxdata *array2,int n);
PFCMPGE is a vector instruction that tests whether each single-precision value in
the destination operand is greater than or equal to the corresponding value in
the source operand. Each 32-bit element of the destination is set to all one bits
(FFFF_FFFFh) if the test is true and to all zero bits otherwise.
The PFCMPGE
instruction performs the following operations:
IF (mmreg1[31:0] >= mmreg2/mem64[31:0])
THEN mmreg1[31:0] = FFFF_FFFFh
ELSE mmreg1[31:0] = 0000_0000h
IF (mmreg1[63:32] >= mmreg2/mem64[63:32])
THEN mmreg1[63:32] = FFFF_FFFFh
ELSE mmreg1[63:32] = 0000_0000h
void _stdcall _pfcmpgt(_mmxdata *array1,_mmxdata *array2,int n);
PFCMPGT is a vector instruction that tests whether each single-precision value in
the destination operand is greater than the corresponding value in the source
operand. Each 32-bit element of the destination is set to all one bits
(FFFF_FFFFh) if the test is true and to all zero bits otherwise.
The PFCMPGT
instruction performs the following operations:
IF (mmreg1[31:0] > mmreg2/mem64[31:0])
THEN mmreg1[31:0] = FFFF_FFFFh
ELSE mmreg1[31:0] = 0000_0000h
IF (mmreg1[63:32] > mmreg2/mem64[63:32])
THEN mmreg1[63:32] = FFFF_FFFFh
ELSE mmreg1[63:32] = 0000_0000h
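The comparison instructions produce all-ones or all-zeros masks so that the result can drive a branch-free select. The C sketch below models PFCMPEQ, PFCMPGE and PFCMPGT lane by lane and then uses a mask to pick between two floats. The types `f32x2` and `u32x2` and every function name here are hypothetical. NaN and infinity handling is not modelled.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float lo, hi; } f32x2;     /* hypothetical: two floats in one operand */
typedef struct { uint32_t lo, hi; } u32x2;  /* hypothetical: the 64-bit mask result */

/* All one bits when the condition holds, all zero bits otherwise. */
static uint32_t lane_mask(int cond) { return cond ? 0xFFFFFFFFu : 0x00000000u; }

static u32x2 pfcmpeq_ref(f32x2 d, f32x2 s)
{
    u32x2 r = { lane_mask(d.lo == s.lo), lane_mask(d.hi == s.hi) };
    return r;
}

static u32x2 pfcmpge_ref(f32x2 d, f32x2 s)
{
    u32x2 r = { lane_mask(d.lo >= s.lo), lane_mask(d.hi >= s.hi) };
    return r;
}

static u32x2 pfcmpgt_ref(f32x2 d, f32x2 s)
{
    u32x2 r = { lane_mask(d.lo > s.lo), lane_mask(d.hi > s.hi) };
    return r;
}

/* Branch-free select on the raw float bits: (m & x) | (~m & y). */
static float select_bits(uint32_t m, float x, float y)
{
    uint32_t xb, yb, rb;
    float r;
    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);
    rb = (m & xb) | (~m & yb);
    memcpy(&r, &rb, sizeof r);
    return r;
}
```

Comparing {2, 1} with {1, 1} gives PFCMPGT masks {FFFF_FFFFh, 0} but PFCMPGE masks {FFFF_FFFFh, FFFF_FFFFh}. Feeding a lane mask to `select_bits` returns `x` where the test held and `y` elsewhere.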
void _stdcall _pfmax(_mmxdata *array1,_mmxdata *array2,int n);
PFMAX is a
vector instruction that returns the larger of the two single-precision,
floating-point operands. Any operation with a zero and a negative number
returns positive zero. An operation consisting of two zeros returns positive
zero.
The PFMAX
instruction performs the following operations:
IF (mmreg1[31:0] > mmreg2/mem64[31:0])
THEN mmreg1[31:0] = mmreg1[31:0]
ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
IF (mmreg1[63:32] > mmreg2/mem64[63:32])
THEN mmreg1[63:32] = mmreg1[63:32]
ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
void _stdcall _pfmin(_mmxdata *array1,_mmxdata *array2,int n);
PFMIN is a
vector instruction that returns the smaller of the two single-precision,
floating-point operands. Any operation with a zero and a positive number
returns positive zero. An operation consisting of two zeros returns positive
zero.
The PFMIN
instruction performs the following operations:
IF (mmreg1[31:0] < mmreg2/mem64[31:0])
THEN mmreg1[31:0] = mmreg1[31:0]
ELSE mmreg1[31:0] = mmreg2/mem64[31:0]
IF (mmreg1[63:32] < mmreg2/mem64[63:32])
THEN mmreg1[63:32] = mmreg1[63:32]
ELSE mmreg1[63:32] = mmreg2/mem64[63:32]
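The signed-zero rules in the PFMAX and PFMIN descriptions go beyond the plain comparison in the pseudocode. For example, the pseudocode alone would return -0 for PFMAX(+0, -0). The per-lane C sketch below applies the stated zero rules first and falls back to the comparison otherwise. The function names are hypothetical. NaN and infinity handling is not modelled.

```c
#include <math.h>

/* Model of one PFMAX lane: a zero with a negative number, or two zeros, gives +0. */
static float pfmax_lane(float a, float b)
{
    if ((a == 0.0f && b <= 0.0f) || (b == 0.0f && a <= 0.0f))
        return 0.0f;              /* +0, whatever the sign of the zero(s) */
    return a > b ? a : b;         /* otherwise the larger operand */
}

/* Model of one PFMIN lane: a zero with a positive number, or two zeros, gives +0. */
static float pfmin_lane(float a, float b)
{
    if ((a == 0.0f && b >= 0.0f) || (b == 0.0f && a >= 0.0f))
        return 0.0f;              /* +0, whatever the sign of the zero(s) */
    return a < b ? a : b;         /* otherwise the smaller operand */
}

/* True only for +0.0f; signbit() tells it apart from -0.0f. */
static int is_positive_zero(float x) { return x == 0.0f && !signbit(x); }
```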
void _stdcall _pfmul(_mmxdata *array1,_mmxdata *array2,int n);
PFMUL is a
vector instruction that performs multiplication of the destination operand and
the source operand. Both operands are single-precision, floating-point operands
with 24-bit significands.
The PFMUL
instruction performs the following operations:
mmreg1[31:0] = mmreg1[31:0] * mmreg2/mem64[31:0]
mmreg1[63:32] = mmreg1[63:32] * mmreg2/mem64[63:32]
void _stdcall _pfrcp(_mmxdata *array1,_mmxdata *array2,int n);
PFRCP is a
scalar instruction that returns a low-precision estimate of the reciprocal of
the source operand. The single result value is duplicated in both high and low
halves of this instruction’s 64-bit result. The source operand is
single-precision with a 24-bit significand, and the result is accurate to 14
bits. Increased accuracy (the full 24
bits of a single-precision significand) requires the use of two additional
instructions (PFRCPIT1 and PFRCPIT2). The first stage of this increase or
refinement in accuracy (PFRCPIT1) requires that the input and output of the
already executed PFRCP instruction be used as input to the PFRCPIT1
instruction.
The PFRCP
instruction performs the following operations:
mmreg1[31:0] = reciprocal(mmreg2/mem64[31:0])
mmreg1[63:32] = reciprocal(mmreg2/mem64[31:0])
void _stdcall _pfrcpit1(_mmxdata *array1,_mmxdata *array2,int n);
PFRCPIT1 is a vector instruction that performs the first step in a Newton-Raphson
iteration to refine the reciprocal approximation produced by the PFRCP
instruction (the second and final step, PFRCPIT2, yields a result accurate to 24
bits). The behavior of this instruction is only defined for those combinations of
operands such that one source operand was the input to the PFRCP instruction and
the other source operand was the output of the same PFRCP instruction.
void _stdcall _pfrcpit2(_mmxdata *array1,_mmxdata *array2,int n);
PFRCPIT2 is a vector instruction that performs the second and final step in a
Newton-Raphson iteration to refine the reciprocal or reciprocal square root
approximation produced by the PFRCP and PFRSQRT instructions, respectively.
The behavior of
this instruction is only defined for those combinations of operands such that
the first source operand (mmreg1) was the output of either the PFRCPIT1 or
PFRSQIT1 instructions and the second source operand (mmreg2/mem64) was the
output of either the PFRCP or PFRSQRT instructions.
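The refinement that PFRCPIT1 and PFRCPIT2 carry out together is one Newton-Raphson step for the reciprocal, x1 = x0 * (2 - b * x0), which roughly doubles the number of correct bits. The plain-C sketch below shows that arithmetic. It does not reproduce how the work is split between the two instructions. `rcp_estimate` is a hypothetical stand-in for PFRCP that keeps 14 significand bits of 1/b by clearing low bits, and it is not how the hardware computes its estimate.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: 1/b cut down to a 14-bit significand
   (1 implicit bit + 13 of the 23 stored bits; the low 10 bits are cleared). */
static float rcp_estimate(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~0x3FFu;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for 1/b: x1 = x0 * (2 - b * x0).
   In 3DNOW this single step is spread over PFRCPIT1 and PFRCPIT2. */
static float rcp_refine(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}

/* Relative error of x as an approximation of 1/b. */
static float rcp_error(float b, float x) { return fabsf(b * x - 1.0f); }
```

For b = 3 the 14-bit estimate is off by about 6e-5, and one refinement step brings the error below 1e-6, close to single-precision accuracy.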
void _stdcall _pfrsqrt(_mmxdata *array1,_mmxdata *array2,int n);
PFRSQRT is a
scalar instruction that returns a low-precision estimate of the reciprocal
square root of the source operand. The single result value is duplicated in
both high and low halves of this instruction’s 64-bit result. The source
operand is single-precision with a 24-bit significand, and the result is
accurate to 15 bits. Negative operands are treated as positive operands for
purposes of reciprocal square root computation, with the sign of the result the
same as the sign of the source operand. Increased accuracy (the full 24 bits of
a single-precision significand) requires the use of two additional instructions
(PFRSQIT1 and PFRCPIT2). The first stage of this increase or refinement in
accuracy (PFRSQIT1) requires that the input and squared output of the already
executed PFRSQRT instruction be used as input to the PFRSQIT1 instruction.
void _stdcall _pfsub(_mmxdata *array1,_mmxdata *array2,int n);
PFSUB is a
vector instruction that performs subtraction of the source operand from the
destination operand. Both operands are single-precision, floating-point
operands with 24-bit significands.
The PFSUB
instruction performs the following operations:
mmreg1[31:0] = mmreg1[31:0] – mmreg2/mem64[31:0]
mmreg1[63:32] = mmreg1[63:32] – mmreg2/mem64[63:32]
void _stdcall _pfsubr(_mmxdata *array1,_mmxdata *array2,int n);
PFSUBR is a vector
instruction that performs subtraction of the destination operand from the
source operand. Both operands are single-precision, floating-point operands
with 24-bit significands.
The PFSUBR
instruction performs the following operations:
mmreg1[31:0] = mmreg2/mem64[31:0] – mmreg1[31:0]
mmreg1[63:32] = mmreg2/mem64[63:32] – mmreg1[63:32]
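PFSUB and PFSUBR differ only in operand order. The small C model below makes that explicit. The `f32x2` struct and the function names are hypothetical and are not part of `<mmx.h>`.

```c
/* Hypothetical stand-in for one 64-bit 3DNOW operand: two floats. */
typedef struct { float lo, hi; } f32x2;

/* PFSUB: destination minus source, lane by lane. */
static f32x2 pfsub_ref(f32x2 dst, f32x2 src)
{
    f32x2 r = { dst.lo - src.lo, dst.hi - src.hi };
    return r;
}

/* PFSUBR: source minus destination, lane by lane. */
static f32x2 pfsubr_ref(f32x2 dst, f32x2 src)
{
    f32x2 r = { src.lo - dst.lo, src.hi - dst.hi };
    return r;
}
```

With dst = {5, 1} and src = {2, 4}, PFSUB gives {3, -3} and PFSUBR gives {-3, 3}. PFSUBR helps when the minuend is the source operand, for example a value in memory. It computes mem - reg directly into the register, where PFSUB would first need the memory value loaded into a register.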
void _stdcall _pfi2fd(_mmxdata *array1,_mmxdata *array2,int n);
PI2FD is a vector instruction that converts a vector register containing signed,
32-bit integers to single-precision, floating-point operands. When PI2FD converts
an input operand with more significant bits than the 24-bit single-precision
significand can hold, the extra low-order bits are truncated, so the result is
rounded toward zero.
The PI2FD
instruction performs the following operations:
mmreg1[31:0] = float(mmreg2/mem64[31:0])
mmreg1[63:32] = float(mmreg2/mem64[63:32])
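A C cast `(float)v` normally rounds to nearest, so it does not always match PI2FD's truncation. The sketch below models one lane by clearing the bits that do not fit in a 24-bit significand before converting. The name `pi2fd_lane` is hypothetical.

```c
#include <stdint.h>

/* Model of one PI2FD lane that truncates (rounds toward zero) instead of
   using C's default int-to-float rounding. Hypothetical helper name. */
static float pi2fd_lane(int32_t v)
{
    /* Work on the magnitude in 64 bits so INT32_MIN does not overflow. */
    int64_t m = v < 0 ? -(int64_t)v : (int64_t)v;
    int shift = 0;

    /* Count the low-order bits that do not fit in a 24-bit significand. */
    while ((m >> shift) >= ((int64_t)1 << 24))
        shift++;

    /* Clear them; what is left converts to float exactly. */
    m = (m >> shift) << shift;

    float f = (float)m;
    return v < 0 ? -f : f;
}
```

For example, 16777219 (2^24 + 3) becomes 16777218.0, and INT32_MAX becomes 2147483520.0. A round-to-nearest conversion would give 16777220.0 and 2147483648.0.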
void _stdcall _pfmulhrw(_mmxdata *array1,_mmxdata *array2,int n);
The PMULHRW instruction multiplies the four signed 16-bit integer values in the
source operand (an MMX register or a 64-bit memory location) by the four
corresponding signed 16-bit integer values in the destination operand (an MMX
register). It then adds 8000h to each 32-bit product, which rounds the high-order
16 bits, and stores the high-order 16 bits of each result (including the sign bit)
in the destination operand.
The PMULHRW instruction provides a numerically more accurate result than the MMX
PMULHW instruction, which truncates the result instead of rounding.
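Modelled on one 16-bit lane, the rounding step looks like this. The sketch uses unsigned arithmetic so that it does not depend on how the compiler right-shifts negative values. The name `pmulhrw_lane` is hypothetical.

```c
#include <stdint.h>

/* Model of one PMULHRW lane: (a * b + 8000h) >> 16, keeping the signed high half. */
static int16_t pmulhrw_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;        /* full 32-bit signed product */
    uint32_t rounded = (uint32_t)prod + 0x8000u;   /* add 8000h to round the high half */
    uint32_t hi = rounded >> 16;                   /* high-order 16 bits as 0..FFFFh */
    return (int16_t)(hi >= 0x8000u ? (int32_t)hi - 0x10000 : (int32_t)hi);
}
```

For 2 × 6000h the exact product is 49152, which is 0.75 × 65536. PMULHRW rounds this up to 1, while a truncating high-half multiply such as PMULHW yields 0.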
The following complete program shows how these intrinsics are used.
#include <stdio.h>
// Always include the mmx header!
#include <mmx.h>
//***********************************************
// Calculate the squares of 8 floating point numbers stored in an
// mmx data vector. Each member of the array contains 2 floats.
//***********************************************
int main(void)
{
    _mmxdata data[4];
    int i;
    // Fill the array: element i holds 2*i (high) and 2*i+1 (low)
    for (i = 0; i < 4; i++) {
        data[i].Floats.high = (float)(i*2);
        data[i].Floats.low = (float)(i*2+1);
    }
    // Execute the multiplication: each element is multiplied by itself
    _pfmul(data, data, 4);
    // Always finish the MMX state before calling any external
    // function like printf
    _emms();
    // Display the results
    for (i = 0; i < 4; i++) {
        printf("%d %f\t", i*2, data[i].Floats.high);
        printf("%d %f\n", 1+i*2, data[i].Floats.low);
    }
    return 0;
}
The output of this
program is:
0 0.000000 1 1.000000
2 4.000000 3 9.000000
4 16.000000 5 25.000000
6 36.000000 7 49.000000